PltRNAdb: Plant transfer RNA database

Transfer RNAs (tRNAs) are intermediate-sized non-coding RNAs found in all organisms that help translate messenger RNA into protein. The number of sequenced plant genomes has recently increased dramatically, and the availability of these data greatly accelerates the large-scale study of tRNAs. Here, 8,768,261 scaffolds/chromosomes containing 229,093 giga-base pairs representing whole-genome sequences of 256 plant species were analyzed to identify tRNA genes. As a result, 331,242 nuclear, 3,216 chloroplast, and 1,467 mitochondrial tRNA genes were identified. The nuclear tRNA genes include 275,134 tRNAs decoding the 20 standard amino acids, 1,325 suppressor tRNAs, 6,273 tRNAs with unknown isotypes, 48,475 predicted pseudogenes, and 37,873 tRNAs with introns. We also created PltRNAdb (https://bioinformatics.um6p.ma/PltRNAdb/index.php), a data source for tRNA genes from 256 plant species. The PltRNAdb website allows researchers to search, browse, visualize, BLAST, and download predicted tRNA genes. PltRNAdb will help improve our understanding of plant tRNAs and open the door to discovering the unknown regulatory roles of tRNAs in plant genomes.


Introduction
Transfer RNAs (tRNAs) are intermediate-sized non-coding RNAs found in all organisms that help translate mRNA into protein [1]. tRNAs are found in all types of cells and organelles and are involved in several cellular processes, including viral replication, amino acid biosynthesis, and cell wall remodeling [2,3]. In plants, tRNAs undergo posttranscriptional processing to reach the mature form required for their function [4]. Recently, Hummel et al. [5] reported a variety of cell biological processes that are affected by the organization, expression, and modification of tRNA genes. These modifications are a source of novel biological functions of tRNAs in plants [6].
To take advantage of the increasing number of sequenced plant genomes, we developed PltRNAdb, a freely available database of tRNA genes from 256 plant species. PltRNAdb contains the following details for the tRNAs identified in each nuclear genome and its available organellar genomes: 1) tRNA sequences, 2) tRNA secondary structure visualization, 3) tRNA upstream and downstream sequences, 4) tRNAs with introns, 5) tRNAs decoding the 20 standard amino acids, 6) possible suppressor tRNAs, 7) tRNAs with unknown isotypes, and 8) predicted tRNA pseudogenes. We hope that by pooling such extensive data into one database, we can improve our understanding of plant tRNAs and open the door to discovering the unknown regulatory roles of tRNAs in plant genomes.

Genomic data
We retrieved FASTA files of sequenced and annotated nuclear genomes for 256 plant species from the NCBI database (https://www.ncbi.nlm.nih.gov/). In addition, we retrieved the available organellar genomes of those species, comprising 100 chloroplast and 52 mitochondrial genomes. The 256 plant genomes include 229 Streptophyta, 24 Chlorophyta, and 3 Rhodophyta (Table 2). The details of the studied species are listed in S1 Table, including plant name, NCBI taxid, assembly type and level, sequence representation and coverage, and sequence category and accession numbers.

Prediction of tRNA genes
tRNAscan-SE v.2.0.9 [21] was used to predict tRNAs in the studied plant genomes. For nuclear genomes, the parameters were: Search Mode: Eukaryotic; Searching with: Infernal first pass; Isotype-specific model scan: yes; Covariance model: TRNAinf-euk.cm; Infernal first pass cutoff score: 10; Temporary directory: tmp. For chloroplast and mitochondrial genomes, the parameters were: Search Mode: Organellar; Searching with: Infernal single-pass scan, maximum sensitivity mode; Covariance model: TRNAinf-1415.cm; Cutoff score: 15.
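The two search configurations above can be illustrated with a small Python helper that assembles a tRNAscan-SE command line per genome. This is only a sketch: the exact command lines used to build PltRNAdb are not given in the text, the function name and file layout are hypothetical, and it assumes that the listed settings are what the `-E` (eukaryotic) and `-O` (organellar) search modes select by default in tRNAscan-SE 2.x.

```python
from pathlib import Path
from typing import List


def build_trnascan_command(fasta: Path, out_prefix: Path,
                           organellar: bool = False) -> List[str]:
    """Build a tRNAscan-SE command line for one genome FASTA file.

    Hypothetical helper: the output file naming is an assumption,
    not the pipeline used for PltRNAdb.
    """
    # -E selects the eukaryotic search mode, -O the organellar one.
    mode_flag = "-O" if organellar else "-E"
    return [
        "tRNAscan-SE",
        mode_flag,
        "-o", f"{out_prefix}.out",    # tab-delimited result table
        "-f", f"{out_prefix}.ss",     # predicted secondary structures
        "-m", f"{out_prefix}.stats",  # run statistics summary
        str(fasta),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where tRNAscan-SE is installed; building the command separately keeps the nuclear and organellar settings easy to inspect before anything is executed.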

Database construction
The PltRNAdb database was created using Apache 2.4.41, MySQL 8.0.27, PHP 7.4.3, Perl 5.30.0, Python 3.8.10, and the D3 library. The interactive web interface was designed using PHP, CSS, HTML, and JavaScript. The workflow for identifying the tRNAs and creating PltRNAdb is shown in Fig 1. Data-Driven Documents (D3.js, https://github.com/d3) was used in PltRNAdb to visualize the secondary structure of all predicted tRNAs.

Results and discussion
Recently, the number of sequenced plant genomes has increased due to advances in genome sequencing. This large number of sequenced genomes requires bioinformatics tools to extract various features and make them available in various databases. For plants, almost half of the

Prediction of tRNAs
Here, 8,768,261 scaffolds/chromosomes with a total length of 229,093 giga base pairs representing nuclear, chloroplast and mitochondrial genome sequences of the studied plant species were analyzed to identify the tRNA genes. As a result, 331,242, 3,216, and 1,467 tRNA genes were identified from nuclear, chloroplast and mitochondrial genomes, respectively (Table 3). Fig 2 shows bar charts for the total number of nuclear tRNAs decoding 20 standard amino acids, suppressor tRNAs, tRNAs with unknown isotypes, and predicted pseudo-genes for each genome.
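The per-genome categories reported above (standard isotypes, suppressors, unknown isotypes, pseudogenes, and tRNAs with introns) can be tallied directly from a tRNAscan-SE result table. The following Python sketch assumes the default tab-delimited output with ten columns (sequence name, tRNA number, begin, end, type, anticodon, intron begin, intron end, score, note), the isotype labels `Sup` and `Undet`, and the word `pseudo` in the note column for pseudogenes. Counting `iMet` with the standard isotypes is also an assumption, and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable

STANDARD_ISOTYPES = {
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
}


def summarize_trnascan(lines: Iterable[str]) -> Counter:
    """Tally tRNA categories from tRNAscan-SE tab-delimited output lines."""
    counts = Counter()
    for line in lines:
        fields = [f.strip() for f in line.rstrip("\n").split("\t")]
        # Header and separator lines have no numeric start coordinate.
        if len(fields) < 9 or not fields[2].isdigit():
            continue
        isotype, intron_start = fields[4], fields[6]
        note = fields[9] if len(fields) > 9 else ""
        counts["total"] += 1
        # Each gene is assigned to exactly one category.
        if "pseudo" in note:
            counts["pseudogene"] += 1
        elif isotype == "Undet":
            counts["unknown_isotype"] += 1
        elif isotype == "Sup":
            counts["suppressor"] += 1
        elif isotype in STANDARD_ISOTYPES or isotype == "iMet":
            counts["standard"] += 1
        else:
            counts["other"] += 1  # for example selenocysteine
        # Intron-free genes carry "0" in the intron begin column.
        if intron_start != "0":
            counts["with_intron"] += 1
    return counts
```

Pseudogenes are counted before the isotype checks so that every gene lands in one category only, which mirrors how the nuclear totals are broken down in the text; intron-containing tRNAs are counted independently of that category.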
Several earlier efforts have predicted plant tRNA genes and made them available through web databases. Databases created using tRNAscan include GtRNAdb [21], tRNAdb [16], tRNADB-CE [18], and PlantRNA [19]. GtRNAdb contains 30,061 predicted tRNA genes derived from 15 plant species, whereas tRNAdb contains 702 tRNA genes derived from 58 plant species. In addition, tRNADB-CE contains 1,352 tRNA genes derived from 2 plant species, while the PlantRNA database contains 66,686 genes derived from 47 plant species. Table 4 compares our database with two previously developed databases (GtRNAdb and PlantRNA). This comparison covers only the 38 sequenced and annotated plant species shared by the compared databases, and reports the number of predicted tRNA genes and the number of tRNAs with introns. The species name, total number of predicted tRNA genes, and number of tRNAs with introns for the 218 plant species available only through the current database are listed in S2 Table.

The PltRNAdb database
The Plant tRNA database (PltRNAdb) was created as a data resource for the tRNA genes of 256 plant species. PltRNAdb was developed using several tools and languages, including MySQL, PHP, Perl, Python, the D3 library, CSS, HTML, and JavaScript. On the PltRNAdb website, researchers can search, browse, visualize, BLAST, and download predicted tRNA genes. Using the links in the main bar of the homepage, researchers can switch between database pages, including the database search, quick access, resources, general statistics, BLAST, and bulk download pages.
The PltRNAdb search page lets researchers retrieve tRNA data in two steps. The first step is to select the plant species, and the second step is to select the nuclear, chloroplast, or mitochondrial genome and the tRNA type. The tRNA types include Ala, Gly, Pro, Thr, Val, Arg, Leu, Phe, Asn, Asp, Glu, His, Ile, Met/iMet, Tyr, Supres, Cys, Ser, Trp, SelCys, Gln, Lys, and Undet. The results are displayed on a new page with the available details of the tRNA genes. The results page is divided into two subsections. The first displays statistical plots of the identified tRNAs in the species searched. The second contains details such as tRNA sequence ID, chromosome/scaffold accession number, sequence start and end within the chromosome/scaffold, tRNA sequence, tRNA secondary structure visualization, tRNA type, anticodon, intron start, intron end, score, and notes. The tRNA secondary structure button leads to a separate page with details of the selected tRNA, including the tRNA secondary structure image, tRNA type, anticodon, tRNA length, upstream and downstream sequence, and tRNA sequence. The results can be downloaded using the Download button at the top of the Results page.

PLOS ONE

The general statistics page of PltRNAdb gives researchers a close look at all available statistics for a selected species. Researchers can select a plant species from the drop-down menu using its scientific name. Summary statistics for the selected species include statistical charts of identified tRNAs and the summary table for the nuclear genome. In addition, statistics for chloroplast and mitochondrial tRNAs are shown when available.
On the Bulk Download page, researchers can download all data for selected plant species. The data are available in different formats, including the FASTA file of tRNAs, the identified tRNA details in tabular format, and the statistics file for each genome separately (nuclear, chloroplast, or mitochondrial genome) (S1 Fig).
BLASTN is embedded in PltRNAdb for tRNA DNA sequence comparisons. It allows researchers to quickly align a FASTA sequence against the tRNA sequences of any one of the 256 plant species. The results table includes the subject ID, query ID, identity, length, mismatch, gaps, query start and end, subject start and end, E-value, and BLAST score (S2 Fig).
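The fields in the BLAST results table correspond to the twelve columns of the standard BLAST+ tabular output (format 6). As a hedged illustration of how such results could be post-processed, the Python sketch below parses tab-delimited hits and keeps those under an E-value threshold. It assumes the standard format 6 column order (query ID first), which may differ from the column order shown on the website; the function name and the default threshold are hypothetical.

```python
from typing import Dict, List

# Standard BLAST+ "-outfmt 6" column names, in their default order.
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_COLUMNS = ("length", "mismatch", "gapopen",
               "qstart", "qend", "sstart", "send")
FLOAT_COLUMNS = ("pident", "evalue", "bitscore")


def parse_blast6(text: str, max_evalue: float = 1e-5) -> List[Dict[str, object]]:
    """Parse tabular BLAST hits and return those passing the E-value filter."""
    hits = []
    for line in text.splitlines():
        # Skip blank lines and comment lines (present in format 7 output).
        if not line.strip() or line.startswith("#"):
            continue
        values = line.split("\t")
        if len(values) != len(BLAST6_COLUMNS):
            continue  # not a 12-column hit line
        hit: Dict[str, object] = dict(zip(BLAST6_COLUMNS, values))
        for key in INT_COLUMNS:
            hit[key] = int(values[BLAST6_COLUMNS.index(key)])
        for key in FLOAT_COLUMNS:
            hit[key] = float(values[BLAST6_COLUMNS.index(key)])
        if hit["evalue"] <= max_evalue:
            hits.append(hit)
    # Best-scoring hits first.
    return sorted(hits, key=lambda h: h["bitscore"], reverse=True)
```

Sorting by bit score puts the most similar tRNA gene first, which is usually the hit a user wants to open in the secondary structure view.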

Case study: Arabidopsis thaliana tRNA genes
In the present study, we selected Arabidopsis thaliana to show users how to navigate PltRNAdb. Because the genome sequence and annotation of Arabidopsis thaliana are of high quality, this case study also serves as a comparison between the current findings and the genome annotation provided by NCBI. In PltRNAdb, 642, 33, and 36 tRNA genes were detected in the nuclear, chloroplast, and mitochondrial genomes of Arabidopsis thaliana, respectively. In the NCBI genome database, a total of 623 tRNA genes were found in the annotation of Arabidopsis thaliana. Based on the location of the tRNA genes in the genome sequence, the 623 NCBI tRNA genes were compared with the 642 tRNA genes identified in the current study. Of the 623 NCBI tRNA genes, 622 match the current findings and only one has no match. The 20 tRNA genes from the current study that are absent from the NCBI annotation comprise 2 Lys, 2 Leu, 2 Glu, 1 Tyr, 1 Cys, 1 Arg, 1 Met, 1 Asp, 2 undetermined, and 7 pseudogenes. The common tRNA genes, the tRNA genes unique to NCBI, and the tRNA genes unique to the current analysis are listed in S3 Table.
PltRNAdb includes searching, browsing, visualization, and downloading functionalities. The search page can be accessed via Database Search in the top bar of each page. First, select Arabidopsis thaliana from the plant species drop-down menu and click the Search button. The statistical charts of Arabidopsis thaliana tRNA genes are displayed on the same page. Second, select the nuclear, chloroplast, or mitochondrial genome and the tRNA type from the genome and tRNA Type drop-down menus (Fig 3).
The search results are displayed on a separate page with statistical charts of the tRNA genes matching the search parameters and a table with details about those tRNA genes. Users can download the search results using the Download button at the top of the results table, or download only the FASTA sequence of a single tRNA gene by clicking the Download button in the tRNA Sequence column (Fig 4). Users can also access the details of a selected tRNA by clicking the View button in the Secondary Structure column. This page displays the details and an image of the secondary structure of the selected tRNA, as well as a download button (Fig 5).
The general statistics page can be accessed by clicking the General Statistics button in the top bar of any page. This page is divided into three subsections. The first is for selecting Arabidopsis thaliana from the plant species drop-down menu. The second contains statistical charts of the total tRNA genes of Arabidopsis thaliana (nuclear, chloroplast, and mitochondrial) and a bar chart of the tRNA types. The third contains statistical tables with nuclear, chloroplast, and mitochondrial values. The statistical tables include the total number of predicted tRNA genes, tRNAs decoding the 20 standard amino acids, selenocysteine tRNAs (TCA), possible suppressor tRNAs, tRNAs with unknown isotypes, predicted pseudogenes, tRNAs with introns, and the number of isotypes/anticodons (Fig 6).
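The location-based comparison between the predicted tRNA genes and the NCBI annotation in this case study can be sketched in Python as follows. The text states only that genes were compared by their location in the genome sequence, so the matching rule used here (any coordinate overlap on the same sequence) is an assumption, and the type alias and function names are hypothetical.

```python
from typing import Iterable, List, Tuple

# (sequence accession, start, end), with 1-based inclusive coordinates.
Gene = Tuple[str, int, int]


def normalize(gene: Gene) -> Gene:
    """Order coordinates so start <= end (minus-strand genes may be reversed)."""
    seq, a, b = gene
    return (seq, min(a, b), max(a, b))


def overlaps(a: Gene, b: Gene) -> bool:
    """True if two normalized genes share at least one base on the same sequence."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def compare_by_location(
    predicted: Iterable[Gene], annotated: Iterable[Gene]
) -> Tuple[int, List[Gene], List[Gene]]:
    """Return (matched annotated count, annotation-only genes, prediction-only genes)."""
    pred = [normalize(g) for g in predicted]
    anno = [normalize(g) for g in annotated]
    matched_pred = set()
    annotation_only = []
    for a in anno:
        hit_indices = [i for i, p in enumerate(pred) if overlaps(a, p)]
        if hit_indices:
            matched_pred.update(hit_indices)
        else:
            annotation_only.append(a)
    prediction_only = [p for i, p in enumerate(pred) if i not in matched_pred]
    return len(anno) - len(annotation_only), annotation_only, prediction_only
```

For the Arabidopsis thaliana nuclear genome, a comparison of this kind would be expected to reproduce the counts reported above: 622 matched NCBI genes, 1 gene found only in the NCBI annotation, and 20 genes found only in the current prediction. A stricter rule, such as requiring identical boundaries, could give different numbers.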

Conclusion and future work
PltRNAdb is a database of tRNA genes, predicted by tRNAscan-SE [8], for 256 plant species. Various tools and programming languages were used to visualize tRNA secondary structures and build the database. PltRNAdb will be regularly updated with newly annotated genomes, and its tools will be improved over time. Although PltRNAdb focuses on the prediction of tRNA genes in fully sequenced and annotated genomes, we plan to add a subsection for incomplete/unannotated plant genomes to bring all available species together in one database. PltRNAdb will be a valuable resource for researchers interested in tRNA research. We hope that PltRNAdb will improve our understanding of plant tRNAs and open the door to discovering the unknown regulatory roles of tRNAs in plant genomes.

Supporting information
S1 Table. List of plant names, NCBI taxid, assembly type and level, sequence representation, coverage, sequence category and accession numbers for all species examined.